Web scraping has long been a popular way of extracting data from websites, and the field has traditionally been dominated by Python and libraries such as Beautiful Soup. Yet there is growing interest in doing the same work with JavaScript directly inside a web browser, and while Python tutorials abound, resources on browser-based scraping are scarce despite the approach being entirely feasible.

The gap is partly historical. Web scraping predates modern JavaScript: the language appeared in 1995 but only matured into a versatile tool in the years that followed, whereas Python was capable from its inception and became the go-to choice for many developers. Node.js has since gained traction, but tooling for scraping inside the browser itself never saw comparable development.

The main technical obstacle to browser-based scraping is CORS (Cross-Origin Resource Sharing), the policy that governs which cross-origin resources JavaScript may access. To work around it, developers typically reach for browser extensions or proxy servers: extensions are constrained by the browser's security model, while proxies offer more flexibility at the cost of extra setup. Python sidesteps CORS entirely because it runs outside the browser, although it then needs additional libraries to parse HTML or execute JavaScript.

As web technologies have evolved, scraping has become more complex, pushing developers toward headless browsers driven by tools like Puppeteer or Selenium, which control a browser without a graphical interface and simulate user interactions. This raises an obvious question: if automation ends up driving a browser anyway, might it be more efficient to write the scraper directly in one?

It turns out that a browser-based scraper can be written in relatively few lines of code. The browser is inherently designed to parse the data structures involved, HTML and JSON among them, which makes it a natural environment for scraping, and once the data is collected, the same page can visualize it, from a simple text dump to a full application.

To illustrate, a simple scraping template needs only an input field for the target URL, a button, and an area for displaying results (first sketch below). The core functionality can be encapsulated in an asynchronous function that fetches the page, processes the response, and extracts the relevant information, such as video titles from a playlist (second sketch).

Converting the fetched HTML into a DOM enables more complex extraction: with the DOMParser API, the raw markup becomes a document that can be queried and manipulated with ordinary DOM methods (third sketch).

Despite this apparent simplicity, many developers overlook the browser, largely because established practices and tools in the industry point elsewhere. The browser is not the right vehicle for large-scale scraping operations, but for personal projects, such as extracting video links, it is a perfectly practical one. For harder targets, including sites like YouTube or Cloudflare-protected resources, setting up a local proxy server becomes essential: it allows more robust handling of requests and responses, including proper management of cookies and headers (final sketch).
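As a rough sketch of what such a template can look like (the element IDs and the scrape function name here are illustrative assumptions, not taken from any particular source):

```html
<!-- Minimal scraping UI: URL input, a trigger button, and a results area -->
<input id="urlInput" type="text" placeholder="https://example.com/playlist">
<button id="scrapeBtn">Scrape</button>
<pre id="results"></pre>

<script>
  // Wire the button to the scrape() function defined in the next sketch.
  document.getElementById('scrapeBtn').addEventListener('click', () => {
    scrape(document.getElementById('urlInput').value);
  });
</script>
```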
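The scraping routine itself might look something like this: a minimal sketch that assumes the target URL is readable from the page (same origin, permissive CORS, or routed through a proxy as described above). The "title" regex is a stand-in for whatever pattern the playlist page actually embeds.

```js
// Core scraping routine: fetch the page, pull out video titles.
// The regex is a placeholder for "title":"..." pairs of the kind
// embedded as JSON in many playlist pages; adjust it per target.
async function scrape(url) {
  const response = await fetch(url); // subject to CORS rules
  const html = await response.text();

  const titles = [...html.matchAll(/"title":"(.*?)"/g)]
    .map(match => match[1]);

  document.getElementById('results').textContent =
    titles.length ? titles.join('\n') : 'No titles found';
}
```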
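For anything beyond trivial extraction, parsing the response into a DOM is more robust than regular expressions. A sketch using DOMParser, where the a.video-link selector is a hypothetical placeholder for the target site's real markup:

```js
// Parse fetched HTML into a real DOM, then query it like any page.
// 'a.video-link' is a hypothetical selector for the target markup.
async function scrapeWithDom(url) {
  const html = await (await fetch(url)).text();
  const doc = new DOMParser().parseFromString(html, 'text/html');

  const videos = [...doc.querySelectorAll('a.video-link')].map(a => ({
    title: a.textContent.trim(),
    // getAttribute avoids resolving the link against *this* page's URL
    href: a.getAttribute('href'),
  }));

  document.getElementById('results').textContent =
    JSON.stringify(videos, null, 2);
}
```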
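Finally, a minimal local proxy sketch in Node.js (assuming Node 18+, where fetch is built in). It fetches the target server-side, where CORS does not apply, and relays the body back with a permissive Access-Control-Allow-Origin header; the port and the ?url= query parameter are arbitrary choices for this example.

```js
// proxy.js — run with: node proxy.js  (assumes Node 18+ for global fetch)
const http = require('http');

http.createServer(async (req, res) => {
  // Target is passed as ?url=<encoded address>, e.g.
  // http://localhost:8080/?url=https%3A%2F%2Fexample.com%2Fplaylist
  const target = new URL(req.url, 'http://localhost').searchParams.get('url');
  if (!target) {
    res.writeHead(400).end('missing ?url= parameter');
    return;
  }

  try {
    // Fetch server-side, where CORS does not apply; forward headers
    // (user agent, cookies) here as the target site requires.
    const upstream = await fetch(target, {
      headers: { 'user-agent': req.headers['user-agent'] ?? 'local-proxy' },
    });
    res.writeHead(upstream.status, {
      'content-type': upstream.headers.get('content-type') ?? 'text/plain',
      // Permissive CORS header so the browser page can read the body.
      'access-control-allow-origin': '*',
    });
    res.end(await upstream.text());
  } catch (err) {
    res.writeHead(502).end(`proxy error: ${err}`);
  }
}).listen(8080, () => console.log('proxy listening on http://localhost:8080'));
```

From the page, the scraper then calls fetch('http://localhost:8080/?url=' + encodeURIComponent(target)) instead of hitting the site directly; a real deployment would also restrict which origins and target hosts the proxy accepts.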
In conclusion, web scraping in the browser is a viable alternative to traditional methods, especially for smaller projects and personal use. By leveraging the capabilities of modern browsers and understanding the necessary configuration, developers can build effective scrapers without relying on extensive third-party tools. For anyone interested in going further, setting up a local proxy server and experimenting with the techniques above is a good place to start.